Project

This project attempts to reproduce the graph and table from FiveThirtyEight’s article “The Best NBA Players, According To RAPTOR.”

Load necessary packages

I used pacman for this because it installs any missing packages and loads them in a single call, which keeps the setup code short.
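A minimal sketch of what that setup might look like. The package list is an assumption: reactable and plotly are the libraries named later in this write-up, and tidyverse is assumed for the wrangling steps.

```r
# Install pacman itself only if it is missing.
if (!requireNamespace("pacman", quietly = TRUE)) {
  install.packages("pacman")
}

# p_load() installs any package that is not yet available, then attaches it.
# ASSUMPTION: this package list is illustrative, not the exact one used here.
pacman::p_load(tidyverse, reactable, plotly)
```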

Load data

The first step to reproducing the table and graph is to download the data. We use two datasets: one for RAPTOR information by player and one for RAPTOR information by team. Both datasets can be found on FiveThirtyEight’s GitHub. We use the datasets labeled ‘latest’ for these figures.
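One way to read both files is sketched below. The base URL and file names are assumptions about the layout of FiveThirtyEight’s data repository and should be checked against the repository before use. The helpers raptor_url() and load_raptor() are hypothetical names introduced only for this example.

```r
# ASSUMPTION: the 'latest' files live under this path with these names.
raptor_base <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/nba-raptor/"

# Build the URL for the by-player or by-team file (hypothetical helper).
raptor_url <- function(kind) {
  kind <- match.arg(kind, c("player", "team"))
  paste0(raptor_base, "latest_RAPTOR_by_", kind, ".csv")
}

# Read one of the two files into a data frame (hypothetical helper).
load_raptor <- function(kind) {
  read.csv(raptor_url(kind), stringsAsFactors = FALSE)
}
```

With a network connection, by_player <- load_raptor("player") and by_team <- load_raptor("team") would then give the two data frames.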

Data Wrangling

The two datasets need to be joined, so the first step is to build a single dataframe for both figures by using a left join and selecting only the relevant variables. According to the article, the figures include players who played in at least 70% of their team’s games and averaged at least 24 minutes per game. The data do not include the percentage of games each player played in, so my filtering is slightly off and produces extra rows in the table and extra data points in the graph.
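The join and filter can be sketched on toy data as follows. The column names (player_id, season, team, mp, and the raptor_ columns) and the join keys are assumptions about the dataset layout, and the rows are made up for illustration. Because games played are not available, the sketch filters on total minutes only, using the 1137-minute cutoff from the filter settings listed later in this write-up.

```r
library(dplyr)

# Toy stand-ins for the two datasets (values are made up).
by_player <- data.frame(
  player_id = c("a1", "b2", "c3"),
  player_name = c("Player A", "Player B", "Player C"),
  season = c(2023L, 2023L, 2023L),
  raptor_offense = c(4.1, 0.3, -1.2),
  raptor_defense = c(1.5, -0.4, 0.8),
  stringsAsFactors = FALSE
)
by_team <- data.frame(
  player_id = c("a1", "b2", "c3"),
  season = c(2023L, 2023L, 2023L),
  team = c("BOS", "DEN", "DEN"),
  mp = c(2500L, 1137L, 400L),
  stringsAsFactors = FALSE
)

raptor <- by_player %>%
  # Keep every player row and attach the team-level columns.
  left_join(by_team, by = c("player_id", "season")) %>%
  # Keep only the variables needed for the table and the graph.
  select(player_name, team, season, mp, raptor_offense, raptor_defense) %>%
  # Minutes-only filter; the 70%-of-games rule cannot be applied here.
  filter(mp >= 1137)

print(raptor)
```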

Reproducibility Comment

While performing basic data wrangling, I noticed that the table FiveThirtyEight created has a column for player positions. However, the datasets provided do not contain information on what position each player played. This would need to be manually researched and added in. Therefore, we can recreate much of the original table, but cannot add this column.

Additionally, the data include the team each player played for, but only as an abbreviation. To make the column look like the one in the article, I had to manually research and add the full team names, because the data do not contain them.
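A lookup of this kind can be sketched with a named character vector. Only three teams are shown, and the abbreviation codes are an assumption about how the dataset spells them. The names team_names and full_team_name() are hypothetical and used only for this example.

```r
# Hand-built lookup from abbreviation to full team name (partial, illustrative).
team_names <- c(
  BOS = "Boston Celtics",
  DEN = "Denver Nuggets",
  MIL = "Milwaukee Bucks"
)

# Translate abbreviations, leaving any unknown code unchanged so that
# gaps in the lookup are easy to spot.
full_team_name <- function(abbr) {
  out <- unname(team_names[abbr])
  ifelse(is.na(out), abbr, out)
}

full_team_name(c("DEN", "BOS", "XYZ"))
```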

Another issue I faced was that FiveThirtyEight’s figures are interactive. Unfortunately, I could not figure out how to make my reproduced figures interactive in the same way, so I chose one specific set of filters and recreated the figures for those settings.

Note: To reproduce this, please set FiveThirtyEight’s filters to the following:

  1. Season: ’22-’23
  2. Season Type: Full Season
  3. Minimum Minutes Played: 1137
  4. Team: All Teams
  5. Position: All Positions
  6. Years of Experience: Any number

Reproduce Table

Comments on constructing the table

I used reactable to recreate the table shown in the article. I had to rename some of the columns, add in the unabbreviated version of the team names, and reformat the minutes played and season columns.
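The sketch below shows one way these steps might look in reactable, using toy rows. The column names, header labels, and the helper format_season() are assumptions for illustration, as is the idea that the season is stored as the ending year (for example 2023).

```r
library(reactable)

# Turn an ending year such as 2023 into a label such as '22-'23
# (hypothetical helper).
format_season <- function(year) {
  sprintf("'%02d-'%02d", (year - 1) %% 100, year %% 100)
}

# Toy rows standing in for the wrangled data (values are made up).
raptor <- data.frame(
  player_name = c("Player A", "Player B"),
  team = c("Boston Celtics", "Denver Nuggets"),
  season = c(2023L, 2023L),
  mp = c(2500L, 1137L),
  raptor_total = c(5.234, -0.87),
  stringsAsFactors = FALSE
)
raptor$season <- format_season(raptor$season)

tbl <- reactable(
  raptor,
  columns = list(
    player_name = colDef(name = "Player"),
    team = colDef(name = "Team"),
    season = colDef(name = "Season"),
    # Thousands separators for minutes played, e.g. 2,500.
    mp = colDef(name = "Minutes played", format = colFormat(separators = TRUE)),
    # Round the rating to one decimal place for display only.
    raptor_total = colDef(name = "RAPTOR total", format = colFormat(digits = 1))
  ),
  defaultSorted = list(raptor_total = "desc")
)
tbl
```

Formatting through colFormat() changes only how values are displayed, so sorting still uses the unrounded numbers.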

Some Notes (Table)

  1. Make sure to swipe to see all the columns.
  2. Rounding and some formatting are a bit off due to my lack of experience using reactable.
  3. The number of rows does not match, possibly because there is no information for filtering on 70% of games played.

Reproduce Scatterplot

Comments on constructing the scatterplot

For the scatterplot, I used the plotly library and the same dataset I constructed for the table above.
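A minimal plotly sketch of such a scatterplot, on toy rows. The column names and the choice of offense on the x-axis and defense on the y-axis are assumptions for illustration.

```r
library(plotly)

# Toy rows standing in for the wrangled data (values are made up).
raptor <- data.frame(
  player_name = c("Player A", "Player B", "Player C"),
  raptor_offense = c(4.1, 0.3, -1.2),
  raptor_defense = c(1.5, -0.4, 0.8),
  stringsAsFactors = FALSE
)

# One marker per player; hovering over a marker shows the player's name.
fig <- plot_ly(
  raptor,
  x = ~raptor_offense,
  y = ~raptor_defense,
  type = "scatter",
  mode = "markers",
  text = ~player_name,
  hoverinfo = "text"
)

# Label the axes.
fig <- layout(
  fig,
  xaxis = list(title = "Offensive RAPTOR"),
  yaxis = list(title = "Defensive RAPTOR")
)
fig
```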

Some Notes (Scatterplot)

  1. Again, some of the formatting is slightly off due to my lack of experience using plotly.

Remarks on Reproducibility

I was able to reproduce the table and graph so that they look about the same as FiveThirtyEight’s. However, I was unable to get the correct number of rows (players) in the table, which results in extra data points in the graph. The article was not clear on how they filtered for players who played in at least 70% of their team’s games. I also was not able to reproduce the position played by each player, because the data do not include it and I did not find a readily available dataset that would have been easy to bring in. Overall, I had most of the information I needed, and my figures were slightly off due to either a lack of data or my lack of experience with the libraries.